



**Search:** The ACM Digital Library  
[+vector +dsp +loop +register simd simdd](#)


Published since January 1990 and Published before October 2003

Terms used **vector** **dsp** **loop** **register** **simd** **simdd**

Sort results by: relevance  
Display results: expanded form

Results 1 - 20 of 200


Best 200 shown


**1** [Energy aware compilation for DSPs with SIMD instructions](#)

Markus Lorenz, Lars Wehmeyer, Thorsten Dräger

June 2002 **ACM SIGPLAN Notices**, Proceedings of the joint conference on compilers and tools for embedded systems: software and hardware for embedded systems LCTES/SCOPES '02, Volume 37 Issue 7

**Publisher:** ACM Press

Full text available: [pdf\(220.77 KB\)](#) Additional Information: [full citation](#), [abstract](#), [citations](#), [index terms](#)

The growing use of digital signal processors (DSPs) in embedded systems motivates the use of optimizing compilers supporting special hardware features. In this paper, we present compiler optimizations with the aim of minimizing energy consumption for DSP applications. This comprises loop optimizations for exploitation of SIMD instructions and zero overhead hardware loops in order to increase performance and decrease energy consumption. In addition, we use a phase coupled code generator ...

**Keywords:** DSP, SIMD instruction, energy minimization, vectorization, hardware loop

**2** [Exploiting SIMD parallelism in DSP and multimedia algorithms using the...](#)

Huy Nguyen, Lizy Kurian John

May 1999 **Proceedings of the 13th international conference on Superco**  
**Publisher:** ACM Press

Full text available: [pdf\(1.16 MB\)](#)

Additional Information: [full citation](#), [refe](#)  
[index terms](#)

**3** [Simulation and architecture evaluation: Vector vs. superscalar and VLIW a](#)  
[embedded multimedia benchmarks](#)

Christoforos Kozyrakis, David Patterson

November 2002 **Proceedings of the 35th annual ACM/IEEE internatio**  
**Microarchitecture MICRO 35**

**Publisher:** IEEE Computer Society Press

Full text available: [pdf\(1.34 MB\)](#)

Additional Information: [full citation](#), [abst](#)  
[citing](#), [index ter](#)

Multimedia processing on embedded devices requires an architecture tha  
performance, low power consumption, reduced design complexity, and s  
this paper, we use EEMBC, an industrial benchmark suite, to compare th  
architecture to superscalar and VLIW processors for embedded multimedia.  
comparison covers the VIRAM instruction set, vectorizing compiler, and  
that integrates a vector processor with DRAM main memory. We de ...

**4** [MOM: a matrix SIMD instruction set architecture for multimedia applicati](#)

Jesus Corbal, Roger Espasa, Mateo Valero

January 1999 **Proceedings of the 1999 ACM/IEEE conference on Super**  
**(CDROM) Supercomputing '99**

**Publisher:** ACM Press

Full text available: [pdf\(116.12 KB\)](#)

Additional Information: [full citation](#), [refe](#)

**5** [Register file and memory system design: Three-dimensional memory vect](#)  
[bandwidth media memory systems](#)

Jesus Corbal, Roger Espasa, Mateo Valero

November 2002 **Proceedings of the 35th annual ACM/IEEE internatio**  
**Microarchitecture MICRO 35**

**Publisher:** IEEE Computer Society Press

Full text available: [pdf\(1.29 MB\)](#)

Additional Information: [full citation, abst](#)  
[index terms](#) [Publisher Site](#)

Vector processors have good performance, cost and adaptability when to applications. However, for a significant number of media programs, configurations fail to deliver enough memory references per cycle to feed functional units. This paper addresses the problem of the memory bandwidth novel mechanism suitable for 2-dimensional vector architectures and targets high effective bandwidth for SIMD memory instructions. The basic ...

**6 MEDEA workshop: Indirect VLIW memory allocation for the ManArray**

Nikos P. Pitsianis, Gerald G. Pechanek

March 2003 **ACM SIGARCH Computer Architecture News**, Volume 3

**Publisher:** ACM Press

Full text available: [pdf\(623.03 KB\)](#)

Additional Information: [full citation, abst](#)  
[index terms](#)

The indirect very long instruction word (iVLIW) architecture and its implementation BOPS ManArray family of multiprocessor digital signal processors (DSP) is a scalable alternative to the wide instruction busses usually required in a narrow VLIW DSP. The ManArray processors indirectly access VLIWs from small sets of VLIWs localized in each processing element. With this work, we present how to perform 1) iVLIW instruction memory allocation on multiple processing elements.

**7 Array recovery and high-level transformations for DSP applications**

Björn Franke, Michael O'Boyle

May 2003 **ACM Transactions on Embedded Computing Systems (TEC)**  
2

**Publisher:** ACM Press

Full text available: [pdf\(744.35 KB\)](#)

Additional Information: [full citation, abst](#)  
[citations, index terms](#)

Efficient implementation of DSP applications is critical for many embedded systems. Optimizing compilers for application programs, written in C, largely focus on code generation and scheduling, which, with their growing maturity, are providing better returns. As DSP applications typically make extensive use of pointer arithmetic, the compiler needs to generate efficient code for pointer arithmetic.

alternative use of high-level, source-to-source, transformations has been  
This article develops an array recovery technique that automatically ...

**Keywords:** Pointer conversion, dataflow graphs, embedded processors, ]  
transformations

**8** Regular contributions: DSP architectures: past, present and futures

Edwin J. Tan, Wendi B. Heinzelman

June 2003 **ACM SIGARCH Computer Architecture News**, Volume 31

**Publisher:** ACM Press

Full text available:  [pdf\(1.27 MB\)](#)

Additional Information: [full citation](#), [abst](#)

As far as the future of communication is concerned, we have seen that th  
for audio and video data to complement text. Digital signal processing (I  
that enables traditionally analog audio and video signals to be processed  
transmission, storage, reproduction and manipulation. In this paper, we v  
various DSP architectures and its silicon implementation. We will also d  
the art and examine the issues pertaining to pe ...

**9** Code selection for media processors with SIMD instructions

Rainer Leupers

January 2000 **Proceedings of the conference on Design, automation and  
DATE '00**

**Publisher:** ACM Press

Full text available:  [pdf\(147.30 KB\)](#)

Additional Information: [full citation](#), [refe](#)  
[index terms](#)

**10** Compilers and Optimization: An empirical evaluation of high level transfo  
embedded processors

Björn Franke, Michael O'Boyle

November 2001 **Proceedings of the 2001 international conference on C  
architecture, and synthesis for embedded systems CAS**

**Publisher:** ACM Press

Full text available: [pdf\(499.08 KB\)](#) Additional Information: [full citation, abstract](#) [index terms](#)

Efficient implementation of DSP applications are critical for many embedded systems. Optimising compilers for application programs written in C, largely focus on code generation and scheduling which, with their growing maturity, are providing high performance returns. This paper empirically evaluates another approach, namely high level source transformations. High level techniques were applied to the DSPs running on 3 platforms: TriMedia TM-1000, Texas Instruments TMS320C6201 and

**11 Evaluating MMX technology using DSP and multimedia applications**

Ravi Bhargava, Lizy K. John, Brian L. Evans, Ramesh Radhakrishnan

November 1998 **Proceedings of the 31st annual ACM/IEEE international conference on Microarchitecture MICRO 31**

**Publisher:** IEEE Computer Society Press

Full text available: [pdf\(1.52 MB\)](#) Additional Information: [full citation, references](#) [index terms](#)

**Keywords:** MMX, digital signal processing, machine measurement, performance monitoring, workload characterization

**12 Exploiting a new level of DLP in multimedia applications**

Jesus Corbal, Mateo Valero, Roger Espasa

November 1999 **Proceedings of the 32nd annual ACM/IEEE international conference on Microarchitecture MICRO 32**

**Publisher:** IEEE Computer Society

Full text available: [pdf\(931.68 KB\)](#) Additional Information: [full citation, abstract](#) [citations, index terms](#)  
[Publisher Site](#)

This paper proposes and evaluates MOM: a novel ISA paradigm targeted at multimedia applications. By fusing conventional vector ISA approaches together with SIMD-like (Single Instruction Multiple Data) ISAs (such as MMX), we propose a new matrix oriented ISA which efficiently deals with the small matrix structures found in multimedia applications. MOM exploits a level of DLP not reached by conventional vector ISAs nor SIMD-like media ISA extensions ...

**13 HIBRID-SOC: a multi-core architecture for image and video applications**

S. Moch, M. Bereković, H. J. Stolberg, L. Fribe, M. B. Kulaczewski, A. L.

September 2003 **ACM SIGARCH Computer Architecture News**, Proceedings of the workshop on MEmory performance: DEaling with Applications and architecture MEDEA '03, Volume 32 Issue 3

**Publisher:** ACM Press

Full text available: [pdf\(245.38 KB\)](#) Additional Information: [full citation](#), [abstract](#)

The HiBRID-SoC multi-core architecture targets a wide range of applications, particularly high processing demands, including general signal processing, video de-encoding, image processing, or a combination of these tasks. The HiBRID-SoC integrates three fully programmable processor cores and memory on a single chip, all tied to a 64-Bit AMBA AHB bus. Its memory subsystem is adapted to the high bandwidth demands of the multi-core architecture ...

**14 Graph-based code selection techniques for embedded processors**

October 2000 **ACM Transactions on Design Automation of Electronic Systems (TODAES)**, Volume 5 Issue 4

**Publisher:** ACM Press

Full text available: [pdf\(356.83 KB\)](#) Additional Information: [full citation](#), [abstract](#), [index terms](#), [review](#)

Code selection is an important task in code generation for programmable processors. The goal is to find an efficient mapping of machine-independent intermediate processor-specific machine instructions. Traditional approaches to code selection are based on tree parsing which enables fast and optimal code selection for intermediate data-flow graphs. While this approach is generally useful in compilers for general-purpose processors, it may lead to poor code ...

**Keywords:** SIMD instructions, code selection, data-flow graphs, embedded systems, irregular data paths

**15 Compilers I: Affinity-based cluster assignment for unrolled loops**

Gayathri Krishnamurthy, Elana D. Granston, Eric J. Stotzer

June 2002 **Proceedings of the 16th international conference on Supercomputing**

**Publisher:** ACM Press

Full text available: [pdf\(633.13 KB\)](#) Additional Information: [full citation, abst](#) [citations, index ter](#)

To compete performance-wise, modern VLIW processors must have fast high instruction-level parallelism (ILP). Partitioning resources (functional registers) into clusters allows the processor to be clocked faster, but operating clusters can easily become a bottleneck. Increasing the number of functional units can increase the potential ILP, but only helps if the functional units can be kept busy. Other features, optimizations such as loop unrolling m ...

**Keywords:** VLIW architectures, affinity-based clustering (ABC) algorithm, assignment, homogeneous clusters, loop optimizations, loop scheduling, partitioned register files, software pipelining

**16** [Trident: a scalable architecture for scalar, vector, and matrix operations](#)

Mostafa I. Soliman, Stanislav G. Sedukhin

January 2002 **Australian Computer Science Communications , Proceed Asia-Pacific conference on Computer systems architecture**  
Volume 24 Issue 3

**Publisher:** Australian Computer Society, Inc., IEEE Computer Society Pre

Full text available: [pdf\(814.51 KB\)](#) Additional Information: [full citation, abst](#) [citations, index ter](#)

Within a few years it will be possible to integrate a billion transistors on this integration level, we propose using a high level ISA to express parallelism instead of using a huge transistor budget to dynamically extract it. Since data structures for a wide variety of applications are scalar, vector, and matrix, Trident processor extends the classical vector ISA with matrix operations. The processor consists of a set of parallel ...

**Keywords:** data parallelism, parallel processing, ring register file, scalar, vector/matrix processing

**17** [Low power DSP's for wireless communications \(embedded tutorial session\)](#)

 Ingrid Verbauwhede, Chris Nicol

August 2000 **Proceedings of the 2000 international symposium on Low**  
**and design ISLPED '00**

**Publisher:** ACM Press

Full text available: [pdf\(424.32 KB\)](#) Additional Information: [full citation, abst](#) [citations, index ter](#)

Wireless communications and more specifically, the fast growing penetration of mobile phones and cellular infrastructure are the major drivers for the development of programmable Digital Signal Processors (DSPs). In this tutorial, an overview of recent developments in DSP processor architectures, that makes them able to execute computationally intensive algorithms typically found in communications. DSP processors have adapted instruction sets, memory archi ...

**Keywords:** architectures, digital signal processing, programmable processor, communications

**18 Improving 3D geometry transformations on a simultaneous multithreaded processor**

June 2001 **Proceedings of the 15th international conference on Supercomputing**

**Publisher:** ACM Press

Full text available: [pdf\(219.10 KB\)](#) Additional Information: [full citation, abst](#) [citations, index ter](#)

In this paper we evaluate the performance of an SMT processor used as a 3D geometry processor for a 3D polygonal rendering engine. To evaluate this approach, we implemented a parallel version of Mesa (a parallel version of Mesa) which parallelizes the geometry stage. We show that SMT is suitable for 3D geometry and we characterize the parallel geometry stage in term of memory hierarchy, which is the main bottleneck. We show that latency is not fully recovered by SMT; the use of L2 ...

**Keywords:** SIMD extensions, SMT, applications specific architectures, data prefetching, parallel rendering

**19 HiBRID-SoC: A Multi-Core System-on-Chip Architecture for Multimedia Applications**

Hans-Joachim Stolberg, Mladen Berekovic, Lars Fribe, Soren Moch, Sebastian Mao, Mark B. Kulaczewski, Heiko Klusmann, Peter Pirsch

March 2003 **Proceedings of the conference on Design, Automation and**  
**Designers' Forum - Volume 2 DATE '03**

**Publisher:** IEEE Computer Society

Full text available: [pdf\(307.90 KB\)](#)

Additional Information: [full citation](#), [abst](#)  
[Publisher Site](#)

The HiBRID-SoC multi-core system-on-chip targets a wide range of applications, particularly high processing demands, including general signal processing, video and audio de-/encoding, and a combination of these tasks. For this HiBRID-SoC integrates three fully programmable processor cores and memory onto a single chip, all tied to a 64-Bit AMBA AHB bus. The processor cores are optimized to the particular computational characteristics ...

**20 Polygon rendering on a stream architecture**

John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter M. Hanrahan

August 2000 **Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Graphics hardware HWWS '00**

**Publisher:** ACM Press

Full text available: [pdf\(161.65 KB\)](#)

Additional Information: [full citation](#), [abst](#)  
[citations](#), [index terms](#)

The use of a programmable stream architecture in polygon rendering provides a mechanism to address the high performance needs of today's complex systems. The need for flexibility and programmability in the polygon rendering pipeline motivates a polygon rendering pipeline that maps into data streams and kernels that operate on them. This mapping is used to implement the polygon rendering pipeline on a programmable stream processor. We compare our results ...

**Keywords:** OpenGL, SIMD, graphics hardware, kernels, media processing, rendering, stream architecture, stream processing, streams


The ACM Portal is published by the Association for Computing Machinery  
ACM, Inc.




Welcome United States Patent and Trademark Office

## Search Results

Results for "((vector simd )<in>metadata)) <and> (pyr >= 1990 <and> pyr <= 2003)"

Your search matched 5 of 1472243 documents.

A maximum of 100 results are displayed, 25 to a page, sorted by Relevance in Descending order.


## » Key

**IEEE JNL** IEEE Journal or Magazine

**IEE JNL** IEE Journal or Magazine

**IEEE CNF** IEEE Conference Proceeding

**IEE CNF** IEE Conference Proceeding

**IEEE STD** IEEE Standard


- 1. **A SIMD vectorizing compiler for digital signal processing algorithms**  
Franchetti, F.; Puschel, M.;  
Parallel and Distributed Processing Symposium., Proceedings International  
15-19 April 2002 Page(s):20 - 26  
Digital Object Identifier 10.1109/IPDPS.2002.1015494  
[AbstractPlus](#) | Full Text: [PDF\(381 KB\)](#) [IEEE CNF](#)  
[Rights and Permissions](#)
- 2. **A survey of parallel computer architectures**  
Duncan, R.;  
Computer  
Volume 23, Issue 2, Feb. 1990 Page(s):5 - 16  
Digital Object Identifier 10.1109/2.44900  
[AbstractPlus](#) | Full Text: [PDF\(800 KB\)](#) [IEEE JNL](#)  
[Rights and Permissions](#)
- 3. **Short vector code generation and adaptation for DSP algorithms**  
Franchetti, F.; Puschel, M.;  
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)  
Volume 2, 6-10 April 2003 Page(s):II - 537-40 vol.2  
Digital Object Identifier 10.1109/ICASSP.2003.1202422  
[AbstractPlus](#) | Full Text: [PDF\(366 KB\)](#) [IEEE CNF](#)  
[Rights and Permissions](#)
- 4. **Architecture independent short vector FFTs**  
Franchetti, F.; Karner, H.; Kral, S.; Ueberhuber, C.W.;  
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01)  
Volume 2, 7-11 May 2001 Page(s):1109 - 1112 vol.2  
Digital Object Identifier 10.1109/ICASSP.2001.941115  
[AbstractPlus](#) | Full Text: [PDF\(312 KB\)](#) [IEEE CNF](#)  
[Rights and Permissions](#)
- 5. **OPSILA: a vector and parallel processor**  
Boeri, F.; Auguin, M.;  
Computers, IEEE Transactions on  
Volume 42, Issue 1, Jan. 1993 Page(s):76 - 82  
Digital Object Identifier 10.1109/12.192215  
[AbstractPlus](#) | Full Text: [PDF\(572 KB\)](#) [IEEE JNL](#)

[Rights and Permissions](#)




### Search Results

Results for "((vector simdd dsp)<in>metadata)) <and> (pyr >= 1990 <= 2003)"  
Your search matched 0 documents.  
A maximum of 100 results are displayed, 25 to a page, sorted by **Relevance** in **Descending** order.


**No results were found.**



### Search Results

Results for "(((vector simd dsp)<in>metadata)) <and> (pyr >= 1990 <and> 2003)"  
Your search matched 0 documents.  
A maximum of 100 results are displayed, 25 to a page, sorted by **Relevance** in **Descending** order.


**No results were found.**




## Search Results

Results for "((vector access patterns)<in>metadata)) <and> (pyr >= 1990 <and> pyr <= 200...)"

Your search matched 1 of 1472243 documents.

A maximum of 100 results are displayed, 25 to a page, sorted by **Relevance** in **Descending** order.

- 1. **Introducing a new cache design into vector computers**  
Quing Yang;  
Computers, IEEE Transactions on  
Volume 42, Issue 12, Dec. 1993 Page(s):1411 - 1424  
Digital Object Identifier 10.1109/12.260632  
[AbstractPlus](#) | Full Text: [PDF\(1276 KB\)](#) [IEEE JNL](#)  
[Rights and Permissions](#)